class: center, middle, inverse, title-slide

.title[
# The Whiz and Viz Bang of Data
]
.subtitle[
## The Basics of Visualization and Modeling
]
.author[
### Dr. Christopher Kenaley
]
.institute[
### Boston College
]
.date[
### 2024/9/16
]

---
class: inverse, top
# In class today

<!-- Add icon library -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.14.0/css/all.min.css">

.pull-left[
Today we'll . . .

- Review/learn about visualization, model choice, and phylogenetic correction
- Look at some models
- Choose which models fit best
- Peek under the hood of Module Project 3

Next time . . .

- Account for phylogenetic history
]

.pull-right[
![](https://miro.medium.com/max/1200/0*MSmfUESNp4eSzNy_)
]

---
class: inverse, top
<!-- slide 1 -->
## What is a model?

- A mathematical explanation of a process or system
- In R, written as a formula: `y~x` (`y` predicted by `x`)
- But it can be more complex:
  * `y~x+a`
  * `y~x+a+b`
  * `y~x+a+b+c`
  * etc.
- Linear model: `lm(y~x)`
  * But could be some other model

---
class: inverse, top
<!-- slide 1 -->
## What is a model?

``` r
library(tidyverse) # provides tibble(), %>%, and ggplot()

# simulate two species with different slopes and noisy responses
set.seed(123)
x.A=1:50
y.A=x.A*2+runif(50,1,200)
x.B=1:50
y.B=x.B*3.5+runif(50,1,200)
d <- tibble(x=c(x.A,x.B),y=c(y.A,y.B),species=c(rep("A",50),rep("B",50)))

# one linear fit per species
d%>%
  ggplot(aes(x,y,col=species))+geom_point()+geom_smooth(method="lm")
```

```
## `geom_smooth()` using formula = 'y ~ x'
```

![](3140_f24_9-16_files/figure-html/unnamed-chunk-2-1.png)<!-- -->

---
class: inverse, top
<!-- slide 1 -->
## Are models accurate descriptions of the process/system?

``` r
spec.lm1 <- lm(y~x+species,data=d)
anova(spec.lm1)
```

```
## Analysis of Variance Table
##
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)
## x          1 103506  103506 29.5261 4.099e-07 ***
## species    1  22023   22023  6.2823   0.01386 *
## Residuals 97 340040    3506
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---
class: inverse, top
<!-- slide 1 -->
## Are models accurate descriptions of the process/system?
``` r
summary(spec.lm1)
```

```
##
## Call:
## lm(formula = y ~ x + species, data = d)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -116.94  -47.00   -3.31   50.33  115.69
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  98.6482    13.4004   7.362 5.94e-11 ***
## x             2.2294     0.4103   5.434 4.10e-07 ***
## speciesB     29.6803    11.8416   2.506   0.0139 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 59.21 on 97 degrees of freedom
## Multiple R-squared:  0.2696, Adjusted R-squared:  0.2546
## F-statistic:  17.9 on 2 and 97 DF,  p-value: 2.41e-07
```

---
class: inverse, top
<!-- slide 1 -->
## Are models accurate descriptions of the process/system?

## Information theory

.pull-left[

``` r
spec.lm2 <- lm(y~x*species,d)
anova(spec.lm2)
```

```
## Analysis of Variance Table
##
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)
## x          1 103506  103506  32.631 1.247e-07 ***
## species    1  22023   22023   6.943  0.009812 **
## x:species  1  35530   35530  11.201  0.001168 **
## Residuals 96 304510    3172
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]

---
class: inverse, top
<!-- slide 1 -->
## Are models accurate descriptions of the process/system?

## Information theory

.pull-left[

``` r
AIC(spec.lm1,spec.lm2)
```

```
##          df      AIC
## spec.lm1  4 1104.953
## spec.lm2  5 1095.917
```

![](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*qzNtzGi7HyVXmxrfOcuaww.png)
]

.pull-right[
![](https://timeseriesreasoning.files.wordpress.com/2021/06/a6352-1nurn_wtjfpwin0mc6t7myq.png)
]
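
---
class: inverse, top
<!-- slide 1 -->
## Information theory: where does the AIC come from?

A minimal sketch of what `AIC()` is doing for the two models compared above, assuming the usual definition AIC = 2k - 2 log(L), where k is the number of estimated parameters and L is the maximized likelihood. The helper `aic_by_hand()` and the names `delta` and `w` are illustrative only and are not part of the course code. The data are rebuilt with base R's `data.frame()` instead of `tibble()` so the block runs without extra packages.

```r
# Rebuild the simulated two-species data (same seed and draw order as before)
set.seed(123)
x.A <- 1:50
y.A <- x.A * 2 + runif(50, 1, 200)
x.B <- 1:50
y.B <- x.B * 3.5 + runif(50, 1, 200)
d <- data.frame(x = c(x.A, x.B), y = c(y.A, y.B),
                species = c(rep("A", 50), rep("B", 50)))

spec.lm1 <- lm(y ~ x + species, data = d)  # additive model
spec.lm2 <- lm(y ~ x * species, data = d)  # adds the x:species interaction

# Hypothetical helper: AIC = 2k - 2*logLik.
# k (the "df" attribute) counts the coefficients plus the residual variance.
aic_by_hand <- function(m) {
  ll <- logLik(m)
  2 * attr(ll, "df") - 2 * as.numeric(ll)
}

aic <- c(spec.lm1 = aic_by_hand(spec.lm1),
         spec.lm2 = aic_by_hand(spec.lm2))

# Difference from the best (lowest) AIC, and Akaike weights that sum to 1
delta <- aic - min(aic)
w <- exp(-delta / 2) / sum(exp(-delta / 2))

data.frame(AIC = aic, delta = delta, weight = w)
```

The model with `delta` of 0 is the best supported of the set. As a rough rule of thumb, a difference of more than about 2 AIC units is often read as meaningful support for the lower-scoring model, but that threshold is a convention rather than a formal test.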